The paper is concerned with distributed learning in large-scale games. The well-known fictitious 
play (FP) algorithm is addressed, which, despite theoretical convergence results, is impractical in large 
settings due to intense computation and communication requirements. An adaptation of the FP algorithm, 
designated as the empirical centroid fictitious play (ECFP), is presented. In ECFP players respond to the 
centroid of all players' actions rather than track and respond to the individual actions of every player. 
It is shown that ECFP is well suited for large-scale games and convergence of the algorithm in terms 
of average empirical frequency (a notion made precise in the paper) to a subset of the Nash equilibria, 
designated as the consensus equilibria, is proven under the assumption that the game is a potential 

algorithm is presented in which players, endowed with a (possibly sparse) preassigned communication 
Convergence results are proven for the distributed ECFP algorithm. It is also shown that the methodology 
I. INTRODUCTION 

We consider a scenario where n players are engaged in repeated play of a finite strategic form 
game, and player-to-player communication is restricted to a local neighborhood (defined according to 
a communication graph, possibly sparse) of each agent. The question of interest is, can agent behavior 
rules be assigned which ensure agents learn a Nash equilibrium (NE) strategy yet are practical in a 
large-scale setting? We focus on the well-known fictitious play (FP) algorithm. Originally introduced as 
a method to compute NE in two player games [1], FP has since been studied extensively to determine 
the class of games for which it can be shown to converge.^1 While the algorithm does not converge 
in all games ([2], [3], [4]), positive convergence results have been proven for certain classes of games 
([5], [6], [7], [8]). Of particular relevance to this paper is a result demonstrating convergence of FP in 
games with an arbitrarily large number of players under the assumption of identical interests [7]. This 
theoretically promising result suggests that FP might be an ideal algorithm for some large-scale settings; 
however, the prohibitively demanding communication and computational requirements of the algorithm 
make any large-scale implementation highly impractical. 

We suggest that a learning algorithm satisfy the following two criteria to be considered well-suited to 
large-scale games: (1) The algorithm is computationally tractable for individual agents; (2) The algorithm 
admits a distributed implementation.^2 

FP satisfies neither criterion. Players executing the FP algorithm must track the past actions 
(empirical distribution) of all other players (violating criterion 2) and respond to this empirical 
distribution by solving an optimization problem whose complexity is exponential in the number of players 
(violating criterion 1). 

The main objective of this paper (see also [9]) is to propose an adaptation of the FP algorithm that is 
well suited to learning NE in large-scale games. We divide our approach into two parts. First we propose an 
adaptation of FP where players respond to the centroid of the marginal empirical distributions (the centroid 
distribution), rather than track and respond to the entire tuple of independent marginal distributions. We 
call this algorithm empirical centroid fictitious play (ECFP). The advantages of this approach are a 
reduction in computational complexity resultant from the inherent symmetry, and a dramatic mitigation 
of the FP information tracking problem by requiring players to track only a single vector (the centroid 
distribution) whose dimension is invariant to the number of players in the game. ECFP is of independent interest but is 
not by itself suited to large-scale games since it violates the distributed criterion given above by requiring 
precise knowledge of the centroid distribution, information unattainable without global knowledge. 

1 In reference to FP or one of its variants, we use the term convergence to mean that the empirical frequency distribution of 
an FP process asymptotically converges to the set of Nash equilibria, a notion to be made precise in section II. 

2 By a distributed implementation we mean that inter-agent information exchange may be restricted to a (preassigned) local 
neighborhood of each agent. 

It is interesting to interpret this approach in light of the highly structured nature of large-scale games. 
To fully describe the utility function of a single agent in a game where all players have m actions, it 
is necessary to specify m^n payoff values, an enormous number of parameters. However, it has been 
observed that 'rare is the game that models an application of interest and yet lacks sufficient structure 
to be specified with a reasonable number of parameters' [10]. Example large-scale games that admit a 
compact representation include congestion games [11], symmetric games [12], anonymous games [13], 
and graphical games [14]. Given that large-scale games of interest tend to have a compact representation, 
the general question arises as to whether a learning algorithm applied to a highly structured large-scale 
game might not equilibrate using a compact representation of player behavior. ECFP can be seen as an 
initial foray into this domain, for which we have gained some positive results. 

The second part of our approach is to assume players are endowed with an ancillary communication 
structure through which they can exchange any information they desire. We represent this structure using 
a graph G = (V, E) where a vertex represents a player, and an edge represents the ability of a player to 
exchange information with a neighboring player. Note that in the majority of game theory literature, a 
graph structure denotes the ability of a player to observe the actions of a neighbor ([15], [16], [17], [18]), 
not to exchange information, as in our approach. In distributed ECFP, players will use the ancillary 
communication network to exchange estimates of the centroid distribution. Information exchange is 
considered non-strategic; players do not try to manipulate the information they send for strategic gain. 

While the assumption of this communication structure is made for algorithmic convenience, we note 
that it makes some degree of sense applied to social and economic settings as well. People engaged in 
repeated play (e.g., a daily commute) might certainly 'talk with their friends,' exchanging opinions on 
the average behavior of the aggregate in the hope of optimizing their next round of play. 

Our main contributions are threefold: 
Main Contribution 1: We present empirical centroid fictitious play (ECFP), a variant of FP which 
is well suited to large-scale games. We show that ECFP converges in terms of average empirical 
distribution to a subset of the mixed strategy Nash equilibria, which we call the consensus equilibria. 
Convergence results are proven for games with identical permutation-invariant utility functions and can 
be extended to the larger class of games known as potential games [19], with the restriction that the 
potential function be permutation invariant. We emphasize that the mode of convergence used in this 
paper (convergence in average empirical frequency) is different from the more conventional convergence 
in empirical frequency. 

The concept of a consensus equilibrium is closely related to that of a symmetric equilibrium. The 
existence of symmetric equilibrium in finite normal form games was first proven by Nash [20] in the 
same work where the concept of Nash equilibrium was originally presented. In general, a symmetric 
equilibrium is a Nash equilibrium that is invariant under automorphisms of the game. A consensus 
equilibrium, on the other hand, is a Nash equilibrium in which all players use an identical strategy. In 
the case of a symmetric game, the two concepts coincide. 

Main Contribution 2: We present the distributed ECFP algorithm, an implementation of ECFP 
in which agent policy update depends only on local neighborhood information exchange. We prove 
convergence of the algorithm to the set of consensus equilibria. Moreover, this convergence guarantees 
that each agent obtains an accurate estimate of the limiting equilibrium strategy. 

Main Contribution 3: We present the distributed FP algorithm, an implementation of the well-known 
FP algorithm where agent policy update depends only on local neighborhood information 
exchange. We prove convergence to the set of Nash equilibria for games with identical interests. 
Distributed FP is computationally equivalent to traditional FP and therefore inherits some of the drawbacks 
for large-scale implementation. However, we consider this contribution valuable because of the archetypal 
role of FP as a learning algorithm. This result shows that our methodology can be generalized and might 
allow for distributed implementation of other FP variants. 

A. Related Work 

An overview of the subject of learning in games is found in [21]. Many large-scale learning algorithms 
exist that are not based on FP, including no-regret algorithms ([22], [23]), aspiration learning [24], and 
other model-free approaches ([25], [26], [27]). These learning algorithms tend to be fundamentally different 
from FP in that they do not track past actions of other players. 

Variants of FP have been proposed for two player games ([28], [29], [30], [21]), generally aimed at 
improving various aspects of the two player algorithm (e.g., faster convergence, convergence in specific 
games, etc.). 

Sampled FP OTTl addresses the problem of computational complexity of FP in large-scale games by 
using a Monte Carlo method to estimate the best response. Although computationally simple in the initial 
steps of the algorithm, the number of samples required to ensure convergence grows without bound. 

Dynamic FP ifTTl applies principles of dynamic feedback from control theory to improve the conver- 
gence properties of a continuous-time version of FP. The algorithm is shown to be stable around some 



Nash equilibria where traditional FP is unstable. While the results generalize to multi-player games, there 
is no mitigation of the information gathering problem. In [32], a similar algorithm utilizing only payoff 
based dynamics is presented. Similar stability results are shown when the class of games is restricted to 
games with a pairwise utility structure. 

Joint strategy FP [33] is shown to converge for generalized ordinal potential games. Players track the 
utility each of their actions would have generated in the previous round, and then use a simple recursion 
to update the predicted utility for each action in the subsequent round. Actions are chosen by maximizing 
the predicted utility. In joint strategy FP, the information tracking problem is mitigated by requiring agents 
to track only the information germane to the computation of the predicted utility for actions of interest. 
No information gathering scheme is explicitly defined; players are assumed to have full access to the 
necessary information at all times. In distributed ECFP, the information gathering scheme is explicitly 
defined via a preassigned (but arbitrary) communication graph and convergence results are demonstrated 
when interagent communication is restricted to local neighborhoods conformant to the graph. 

There is a relationship between the ancillary communication structure presented in this paper and 
the class of state-based potential games [34], [35]. In such games, the action space is augmented with 
an additional state space. Payoffs are based on both the action and the state. The states may take on 
various interpretations; in [35] a player's state space consists of a value and a set of estimates of other 
players' values. Players are permitted to observe the state space of neighbors and update their estimates 
accordingly. 

Along these same lines, in distributed ECFP we assume players have a personal estimate of the centroid 
distribution which they update by observing the estimates of neighbors. While there is an interesting 
relationship between the concepts, there are fundamental and important differences, the most important 
of which is the overarching objective of each work. In [34], [35] the primary objective is to design a 
distributed optimization framework amenable to existing game-theoretic learning algorithms, whereas the 
objective of this paper is to design a learning algorithm that is well suited to large-scale games. 

The remainder of the paper is organized as follows: Section II sets up notation to be used in the 
subsequent development. The set of consensus equilibria is defined and the classical (centralized) FP 
algorithm is reviewed in the same section. Section III introduces ECFP as a low-information-overhead 
repeated-play alternative to FP for learning consensus equilibria in multi-agent games. A fully distributed 
implementation of ECFP, the distributed ECFP algorithm, for multi-agent scenarios in which agent 
information dynamics are restricted to communication over a preassigned sparse communication network, 
is presented and analyzed in section IV. In section V we discuss generalizations of ECFP, including a 
distributed implementation of the traditional FP algorithm. In section VI we demonstrate an application 
of distributed ECFP in a traffic routing scenario. Finally, section VII concludes the paper. 

II. Preliminaries 

Let $\Gamma$ be a normal form game with a set of players $N = \{1, \ldots, n\}$. The set of actions, or pure strategies, 
for player i is given by $Y_i = \{1, 2, \ldots, m_i\}$, and the set of joint actions is given by $Y^n = \prod_{i=1}^{n} Y_i$. The 
utility function of player i is given by $u_i(y) : Y^n \to \mathbb{R}$. 

The set of mixed strategies for player i is given by $\Delta_i = \{p \in \mathbb{R}^{m_i} : \sum_{k=1}^{m_i} p(k) = 1,\ p(k) \ge 0\ \forall k\}$, the 
$m_i$-simplex. A mixed strategy $p_i \in \Delta_i$ may be thought of as a probability distribution from which player 
i samples to choose an action. In this context, a pure strategy may be thought of as a vertex of the 
probability simplex. With a slight abuse of notation, we denote the set of actions, or pure strategies, for 
player i, in this context, using the notation $A_i = \{e_1, e_2, \ldots, e_{m_i}\}$, where $m_i$ is the number of strategies 
available to player i, and $e_j$ is the jth canonical vector in $\mathbb{R}^{m_i}$. The set of joint mixed strategies is given 
by $\Delta^n = \prod_{i=1}^{n} \Delta_i$, and the set of joint actions is given by $A^n = \prod_{i=1}^{n} A_i$. A joint mixed strategy is 
given by the n-tuple $(p_1, p_2, \ldots, p_n)$, $p_i \in \Delta_i$. In this paper, we often make the assumption that players 
use identical action spaces, in which case we drop the subscript on individual action spaces and write 
$\Delta = \Delta_i\ \forall i$, $A = A_i\ \forall i$, and $Y = Y_i = \{1, 2, \ldots, m\}\ \forall i$. 

The mixed utility function for player i is given by the multilinear function $U_i(\cdot) : \Delta^n \to \mathbb{R}$, 

$$U_i(p_1, \ldots, p_n) := \sum_{y \in Y^n} u_i(y)\, p_1(y_1) \cdots p_n(y_n). \tag{1}$$

Note that the mixed utility $U_i(p)$ may be interpreted as the expected value of $u_i(y)$ given that the players' 
mixed strategies are independent. For convenience the notation $U_i(p)$ will often be written as $U_i(p_i, p_{-i})$ 
where $p_i \in \Delta_i$ is the mixed strategy for player i, and $p_{-i}$ indicates the joint mixed strategy for all 
players other than i. This paper will deal mostly with games with identical utility functions such that 
$U_i(p) = U_j(p)\ \forall i, j$; in such cases we drop the subscript and write $U(p) = U_i(p)\ \forall i$. 
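
To make definition (1) concrete, the following is a minimal sketch that evaluates the mixed utility by direct enumeration of joint actions. The function name `mixed_utility` and the two-player coordination payoff are hypothetical illustrations, not taken from the paper. The loop visits every joint action, so its cost grows as the product of the players' action counts (m^n for identical action spaces), which is the exponential burden discussed in the introduction.

```python
from itertools import product


def mixed_utility(u, p):
    """Evaluate U(p_1, ..., p_n) = sum_y u(y) * p_1(y_1) * ... * p_n(y_n).

    u: function mapping a joint action (tuple of 0-based indices) to a payoff.
    p: list of n probability vectors, one per player.
    """
    total = 0.0
    # Enumerate every joint action y in Y^n.
    for y in product(*(range(len(p_i)) for p_i in p)):
        prob = 1.0
        for p_i, y_i in zip(p, y):
            prob *= p_i[y_i]
        total += u(y) * prob
    return total


# Hypothetical two-player coordination game: payoff 1 when actions match.
def coordination(y):
    return 1.0 if y[0] == y[1] else 0.0


uniform = [0.5, 0.5]
value = mixed_utility(coordination, [uniform, uniform])
```

With both players mixing uniformly, the two matching joint actions each carry probability 0.25, so `value` is 0.5.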

Let $\{a_i(t)\}_{t=1}^{\infty}$ be a sequence of actions for player i, where $a_i(t) \in A_i\ \forall t$. Let $\{a(t)\}_{t=1}^{\infty}$ be the 
associated sequence of joint actions $a(t) \in A^n$. Note that $a_i(t) \in \mathbb{R}^m$; when necessary, we denote the kth 
element of the vector $a_i(t)$ by $a_i(t,k)$. Let $q_i(t)$ be the normalized histogram (empirical distribution) of 
the actions of player i up to time t, i.e., $q_i(t) = \frac{1}{t}\sum_{s=1}^{t} a_i(s)$. Similarly, $q(t) = \frac{1}{t}\sum_{s=1}^{t} a(s)$ is the 
joint empirical distribution corresponding to the joint actions of the players up to time t. The sequence 
of distributions $\{q(t)\}_{t=1}^{\infty}$ is often called a belief sequence. 
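
As a small illustration of the empirical distribution, the sketch below builds q_i(t) from a sequence of actions using the equivalent running-average recursion q_i(t+1) = q_i(t) + (a_i(t+1) - q_i(t)) / (t+1). The helper names `one_hot` and `empirical_distribution` and the sample action sequence are assumptions made for this example.

```python
def one_hot(action, m):
    """Canonical vector e_action in R^m (0-based action index)."""
    return [1.0 if k == action else 0.0 for k in range(m)]


def empirical_distribution(actions, m):
    """Return q(t) = (1/t) * sum_{s<=t} a(s) for t = len(actions).

    Uses the recursion q(t+1) = q(t) + (a(t+1) - q(t)) / (t+1),
    so only the current q needs to be stored.
    """
    q = one_hot(actions[0], m)  # q(1) = a(1)
    for t, act in enumerate(actions[1:], start=1):
        a = one_hot(act, m)
        q = [q_k + (a_k - q_k) / (t + 1) for q_k, a_k in zip(q, a)]
    return q


# Hypothetical play history of one player over four rounds, m = 3 actions.
q = empirical_distribution([0, 1, 1, 2], m=3)
```

For this history the player chose action 0 once, action 1 twice, and action 2 once, so `q` is approximately (0.25, 0.5, 0.25).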



A mixed strategy p is a Nash equilibrium of $\Gamma$ if, for each player i, $U_i(p) \ge U_i(g_i, p_{-i})\ \forall g_i \in \Delta_i$. We 
define the set of Nash equilibria as 

$$K = \{p \in \Delta^n : U_i(p) \ge U_i(g_i, p_{-i})\ \forall g_i \in \Delta_i,\ \forall i\},$$

and the subset of consensus equilibria as 

$$C = \{p \in K : p_1 = p_2 = \cdots = p_n\}.$$

The set of $\epsilon$-Nash equilibria is given by 

$$K_\epsilon = \{p \in \Delta^n : U_i(p) + \epsilon \ge U_i(g_i, p_{-i})\ \forall g_i \in \Delta_i,\ \forall i\}, \tag{2}$$

and the set of $\epsilon$-consensus equilibria as 

$$C_\epsilon = \{p \in \Delta^n : U_i(p) + \epsilon \ge U_i(g_i, p_{-i})\ \forall g_i \in \Delta_i,\ \forall i,\ p_1 = p_2 = \cdots = p_n\}. \tag{3}$$

The distance of a distribution $p \in \Delta^n$ from C is given by $d(p, C) = \inf\{\|p - g^*\| : g^* \in C\}$. Throughout 
the paper $\|\cdot\|$ denotes the standard $\ell_2$ Euclidean norm unless otherwise specified. For $\delta > 0$ we denote 
the set 

$$B_\delta(C) = \{p \in \Delta^n : p_1 = p_2 = \cdots = p_n,\ d(p, C) < \delta\}.$$

Unless stated otherwise, we will restrict attention to games with identical permutation-invariant utilities; 
formally, we assume: 

A.1. All players use the same strategy space. 

A.2. The players' utility functions are identical and permutation invariant. 

Note that under these assumptions, the set of consensus equilibria is known to be nonempty [12]. 

Let 

$$\bar{q}(t) = \frac{1}{n}\sum_{i=1}^{n} q_i(t)$$

be the average empirical distribution. Let $\bar{q}^n(t) = (\bar{q}(t), \bar{q}(t), \ldots, \bar{q}(t)) \in \Delta^n$ denote the mixed strategy 
where all players use the empirical average as their individual strategy. For convenience in notation we 
sometimes write $U(\bar{q}_i(t), \bar{q}_{-i}(t))$, where the subscripts indicate the strategy $\bar{q}(t)$ is being used by player i 
or by all players except i respectively. The vector $\bar{q}(t)$ will be extremely important in the exposition of 
ECFP. 



A. Fictitious Play 

In fictitious play the best response of player i at time t is given by^3 

$$v_i(q(t)) := \max_{\alpha_i \in A_i} U(\alpha_i, q_{-i}(t)). \tag{4}$$

In other words this means that at time t each player chooses a best response by assuming that the empirical 
distributions of the other players accurately represent their respective mixed strategies. A fictitious play 
process is a sequence $\{a(t)\}_{t=1}^{\infty} \in A^n$ such that, 

$$v_i(q(t)) = U(a_i(t+1), q_{-i}(t)), \quad \forall i.$$

A fictitious play process is said to converge in empirical frequency if $\lim_{t \to \infty} d(q(t), K) = 0$. In [7] it 
was shown that a fictitious play process converges in the above sense for games satisfying A.1-A.2. 
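
The following is a minimal sketch of a fictitious play process for a small identical-interest game, using brute-force enumeration for the best response in (4). The names `expected_utility` and `fictitious_play`, the coordination payoff, and the tie-breaking rule (lowest action index) are assumptions of this example. Each player stores the empirical distribution of every other player, and each best response enumerates the joint actions of the other n - 1 players, which illustrates why FP scales poorly.

```python
from itertools import product


def expected_utility(u, i, a, q):
    """U(a, q_{-i}): player i plays pure action a, player j != i mixes per q[j]."""
    n = len(q)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for ys in product(*(range(len(q[j])) for j in others)):
        prob = 1.0
        y = [0] * n
        y[i] = a
        for j, y_j in zip(others, ys):
            prob *= q[j][y_j]
            y[j] = y_j
        total += prob * u(tuple(y))
    return total


def fictitious_play(u, initial_actions, m, steps):
    """Run FP and return the empirical distributions q_i(steps) for all players."""
    n = len(initial_actions)
    q = [[1.0 if k == a else 0.0 for k in range(m)] for a in initial_actions]
    for t in range(1, steps):
        # All players best respond simultaneously to the same beliefs q(t);
        # ties are broken in favor of the lowest action index.
        actions = [
            max(range(m), key=lambda a, i=i: expected_utility(u, i, a, q))
            for i in range(n)
        ]
        q = [
            [q_k + ((1.0 if k == act else 0.0) - q_k) / (t + 1)
             for k, q_k in enumerate(q_i)]
            for q_i, act in zip(q, actions)
        ]
    return q


# Hypothetical two-player coordination game with identical payoffs.
def coordination(y):
    return 1.0 if y[0] == y[1] else 0.0


q = fictitious_play(coordination, initial_actions=[0, 1], m=2, steps=100)
```

In this run the players miscoordinate for two rounds, then settle on action 0, so both empirical distributions concentrate on that action.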

III. Empirical Centroid Fictitious Play 

The difficulty of implementing FP in large-scale distributed games can be understood by analyzing the 
FP best response calculation given in (4). The two major problems with FP are 

(i) Each player must have access to the marginal empirical frequency distributions of the other n — 1 
players in order to compute a best response. It is impractical for each agent to track the actions of all 
other agents. 

(ii) The computational complexity of computing the mixed utility given an n-dimensional probability 
density function grows exponentially with the size of the game. 

The key idea of ECFP is a modification of the best response function which mitigates both of these 
problems. Consider a scenario where a player knows the structure of the game but does not have the 
ability to track the individual actions, $a_i(t)$, of any single player. Rather, a player is only able to track 
the average action, $\bar{a}(t) := \frac{1}{n}\sum_{i=1}^{n} a_i(t)$, of the collective and therefore has access only to the average 
empirical distribution, $\bar{q}(t) = \frac{1}{n}\sum_{i=1}^{n} q_i(t)$. In ECFP a player computes a best response by assuming the 
average empirical distribution accurately represents the mixed strategy of each of the other players and 
maximizes her utility accordingly. The best response in ECFP is given by 

$$v_i^m(\bar{q}(t)) := \max_{\alpha_i \in A_i} U(\alpha_i, \bar{q}_{-i}(t)), \tag{5}$$

where $v_i^m(p) : \Delta \to \mathbb{R}$. We use the superscript m to distinguish between the modified best response (5) and 
the traditional FP best response (4). 

3 A maximizing $\alpha_i$ in (4) exists since $A_i$ is assumed to be finite. 

In this scheme the information tracking problem of FP is mitigated 
by requiring players to track only the centroid distribution, a vector whose dimension is invariant to the 
number of players in the game. The problem of computational complexity is resolved in a less direct 
manner. By exploiting the symmetry inherent in the ECFP best response calculation, the computation 
can often be greatly simplified. For example, in the distributed traffic routing scenario of section VI 
the complexity of the best response calculation is reduced to constant time complexity in terms of the 
number of players. 
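
To illustrate how symmetry can remove the dependence on the number of players, here is a minimal sketch for a hypothetical game that is not the traffic routing example of the paper. It assumes an identical, permutation-invariant utility of pairwise form, U(y) = sum over pairs i < j of w[y_i][y_j] with w symmetric. When the other n - 1 players all use the centroid, the part of U that depends on player i's own action a is (n - 1) * sum_k w[a][k] * q_bar[k], so the maximizer can be found in O(m^2) operations without reference to n. The function name `ecfp_best_response` is an assumption of this example.

```python
def ecfp_best_response(w, q_bar):
    """Best response to the centroid q_bar for a pairwise-symmetric utility.

    w: symmetric m-by-m payoff table; a pair of players earns w[a][b].
    q_bar: centroid (average empirical distribution), length m.
    Returns the maximizing action (lowest index on ties) and all scores.
    The positive factor (n - 1) is dropped because it does not change the argmax.
    """
    m = len(q_bar)
    scores = [sum(w[a][k] * q_bar[k] for k in range(m)) for a in range(m)]
    best = max(range(m), key=lambda a: scores[a])
    return best, scores


# Hypothetical anti-coordination payoff: a pair earns 1 when its actions differ.
w = [[0.0, 1.0],
     [1.0, 0.0]]
best, scores = ecfp_best_response(w, q_bar=[0.7, 0.3])
```

Here most of the population weight sits on action 0, so the best response is the less crowded action 1.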

In an ideal ECFP process, the best response calculation is given by (5). We consider a more general 
case where players do not have access to $\bar{q}(t)$ directly. Instead, player i has access to $\hat{q}_i(t) \in \mathbb{R}^m$, an 
approximation or estimate of $\bar{q}(t)$. Let $\epsilon_i(t) = \|\hat{q}_i(t) - \bar{q}(t)\|$ be the error in player i's estimate. We make 
the following assumption about the decay rate of the error: 

A.3. $\epsilon_i(t) = O\left(\frac{\log t}{t^r}\right)$, for some r > 0. 

This particular decay rate will appear naturally in the analysis of ECFP in a distributed setting. A 
sequence of actions $\{a(t)\}_{t=1}^{\infty}$ is an empirical centroid fictitious play process if 

$$v_i^m(\hat{q}_i(t)) = U(a_i(t+1), \hat{q}_{-i}(t)), \tag{6}$$

where $\hat{q}_{-i}(t)$ is the (n - 1)-tuple $\hat{q}_{-i}(t) = (\hat{q}_i(t), \ldots, \hat{q}_i(t))$ and the initial action $a_i(1)$ is chosen arbitrarily 
for all i. We note that the traditional definition of the mixed utility U(p), given in (1), is defined over 
the domain $\Delta^n$. The restriction of the domain to $\Delta^n$ is not necessitated by the definition; rather, it is 
a byproduct of the traditional approach dealing only with mixed strategies $p \in \Delta^n$. The approximated 
empirical distribution $\hat{q}_i(t) \in \mathbb{R}^m$, however, is permitted to be outside the simplex, $\Delta$, and may even take 
negative values. In this case we retain the definition of U(p), given by (1), but extend the domain to the 
set of all n-tuples of vectors in $\mathbb{R}^m$. This adjustment of the traditional definition expands the domain to 
an unbounded set, but for practical purposes, we note that assumption A.3 implies $\{\hat{q}_i(t)\}_{t \ge 1}$ belongs to 
a compact set. 
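
A minimal simulation sketch of an ideal ECFP process, in which every player knows the centroid exactly (zero estimation error), may help fix ideas. It assumes a hypothetical pairwise anti-coordination utility, U(y) = sum over pairs i < j of w[y_i][y_j], which is identical across players and permutation invariant; for this game the only strategy p with all players indifferent between the two actions is p = (0.5, 0.5), which is therefore the unique consensus equilibrium. The function name `run_ecfp`, the payoff table, and the tie-breaking rule are assumptions of this example.

```python
def run_ecfp(w, initial_actions, steps):
    """Simulate ideal ECFP and return the centroid after `steps` updates.

    w: symmetric m-by-m pairwise payoff table (hypothetical game).
    initial_actions: arbitrary first action a_i(1) for each player.
    """
    n, m = len(initial_actions), len(w)
    # q_i(1) = a_i(1), stored as one-hot vectors.
    q = [[1.0 if k == a else 0.0 for k in range(m)] for a in initial_actions]
    for t in range(1, steps + 1):
        q_bar = [sum(q_i[k] for q_i in q) / n for k in range(m)]
        # Best response to the centroid; the common factor (n - 1) is dropped.
        scores = [sum(w[a][k] * q_bar[k] for k in range(m)) for a in range(m)]
        best = max(range(m), key=lambda a: scores[a])
        # Every player sees the same centroid, so all choose the same action.
        for i in range(n):
            q[i] = [q_k + ((1.0 if k == best else 0.0) - q_k) / (t + 1)
                    for k, q_k in enumerate(q[i])]
    return [sum(q_i[k] for q_i in q) / n for k in range(m)]


w = [[0.0, 1.0],
     [1.0, 0.0]]
q_bar = run_ecfp(w, initial_actions=[0, 0, 0, 1], steps=200)
```

In this run the centroid oscillates around (0.5, 0.5) with shrinking amplitude, consistent with convergence of the average empirical distribution to the consensus equilibrium, even though the individual empirical distributions q_i(t) need not coincide.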

In (6), each player best responds using $\hat{q}_i(t)$ (her personal estimate of $\bar{q}(t)$) as the assumed mixed 
strategy for the other n - 1 players. In ECFP, players learn a strategy which is a consensus Nash 
equilibrium strategy. We summarize the result in the following Theorem: 

Theorem 1. Let $\{a(t)\}_{t=1}^{\infty}$ be an ECFP process such that A.1-A.3 hold. Then $d(\bar{q}^n(t), C) \to 0$ as 
$t \to \infty$. 

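Proof: Let $\bar{a}(t) = \frac{1}{n}\sum_{i=1}^{n} a_i(t)$, where $\bar{a}(t) \in \Delta$, $a_i(t) \in A$. Let $\bar{a}^n(t) \in \Delta^n$ be the n-tuple 
$(\bar{a}(t), \ldots, \bar{a}(t))$. Note that for $t \ge 1$ 

$$\bar{q}^n(t+1) = \bar{q}^n(t) + \frac{1}{t+1}\left(\bar{a}^n(t+1) - \bar{q}^n(t)\right). \tag{7}$$

Using (7) we write 

$$U(\bar{q}^n(t+1)) = U\left(\bar{q}^n(t) + \frac{1}{t+1}\left(\bar{a}^n(t+1) - \bar{q}^n(t)\right)\right).$$

Applying the multilinearity of $U(\cdot)$, we obtain 

$$U(\bar{q}^n(t+1)) = U(\bar{q}^n(t)) + \frac{1}{t+1}\sum_{i=1}^{n} U(\bar{a}_i(t+1), \bar{q}_{-i}(t)) - \frac{1}{t+1}\sum_{i=1}^{n} U(\bar{q}_i(t), \bar{q}_{-i}(t)) + \zeta(t+1),$$

where we have explicitly written the first order terms of the expansion and collected the remaining terms 
in $\zeta(t+1)$. Note that the number of second order terms in the above expansion is finite and the terms are 
uniformly bounded since $\max_{p \in \Delta^n} |U(p)| < \infty$. Hence, there exists a positive constant M (independent 
of t) large enough such that $|\zeta(t+1)| \le M (t+1)^{-2}$ for all t. Thus, 

$$U(\bar{q}^n(t+1)) \ge U(\bar{q}^n(t)) + \frac{1}{t+1}\sum_{i=1}^{n} U(\bar{a}_i(t+1), \bar{q}_{-i}(t)) - \frac{1}{t+1}\sum_{i=1}^{n} U(\bar{q}_i(t), \bar{q}_{-i}(t)) - \frac{M}{(t+1)^2}.$$

The permutation invariance and multilinearity of $U(\cdot)$ permits a rearranging of terms. We use the notation 
$[a_j(t)]_i$ to indicate the action of player j at time t being played by player i. 

$$\begin{aligned}
\sum_{i=1}^{n} U(\bar{a}_i(t+1), \bar{q}_{-i}(t)) &= \sum_{i=1}^{n} U\Big(\frac{1}{n}\sum_{j=1}^{n} [a_j(t+1)]_i,\ \bar{q}_{-i}(t)\Big) \\
&= \sum_{i=1}^{n} \frac{1}{n}\sum_{j=1}^{n} U([a_j(t+1)]_i, \bar{q}_{-i}(t)) \\
&= \sum_{j=1}^{n} \frac{1}{n}\sum_{i=1}^{n} U([a_j(t+1)]_j, \bar{q}_{-j}(t)) \\
&= \sum_{j=1}^{n} U(a_j(t+1), \bar{q}_{-j}(t)).
\end{aligned}$$

Thus, 

$$U(\bar{q}^n(t+1)) - U(\bar{q}^n(t)) + \frac{M}{(t+1)^2} \ge \frac{1}{t+1}\sum_{i=1}^{n} U(a_i(t+1), \bar{q}_{-i}(t)) - \frac{1}{t+1}\sum_{i=1}^{n} U(\bar{q}_i(t), \bar{q}_{-i}(t)). \tag{8}$$

Let $L_i(t+1) = v_i^m(\hat{q}_i(t)) - U(a_i(t+1), \bar{q}_{-i}(t))$. Substituting in $L_i(t+1)$, (8) becomes 

$$U(\bar{q}^n(t+1)) - U(\bar{q}^n(t)) + \frac{M}{(t+1)^2} + \frac{1}{t+1}\sum_{i=1}^{n} L_i(t+1) \ge \frac{1}{t+1}\sum_{i=1}^{n}\left(v_i^m(\hat{q}_i(t)) - U(\bar{q}_i(t), \bar{q}_{-i}(t))\right) = \frac{\alpha_{t+1}}{t+1}, \tag{9}$$

where 

$$\alpha_{t+1} := \sum_{i=1}^{n}\left(v_i^m(\hat{q}_i(t)) - U(\bar{q}_i(t), \bar{q}_{-i}(t))\right).$$

Note that $U(\cdot)$ is multilinear and therefore locally Lipschitz continuous. As noted earlier, assumption 
A.3 implies that $\{\hat{q}_i(t)\}_{t \ge 1}$ is contained in a compact subset of $\mathbb{R}^m$. Therefore, there exists a positive 
constant K (independent of t), such that 
$|U(a_i(t+1), \hat{q}_{-i}(t)) - U(a_i(t+1), \bar{q}_{-i}(t))| \le K\|(a_i(t+1), \hat{q}_{-i}(t)) - (a_i(t+1), \bar{q}_{-i}(t))\|$, for all i. 
By assumption A.3, $\|\hat{q}_{-i}(t) - \bar{q}_{-i}(t)\| = O\left(\frac{\log t}{t^r}\right)$, and hence 
$|U(a_i(t+1), \hat{q}_{-i}(t)) - U(a_i(t+1), \bar{q}_{-i}(t))| = O\left(\frac{\log t}{t^r}\right)$, which, by (6), implies $L_i(t) = O\left(\frac{\log t}{t^r}\right)$. In 
particular, $\sum_{t=2}^{T} \frac{L_i(t)}{t}$ is bounded above by some $B \in \mathbb{R}$ for all $T \ge 1$. Summing over $1 \le t \le T$ 
in (9), 

$$U(\bar{q}^n(T+1)) - U(\bar{q}^n(1)) + \sum_{t=1}^{T}\frac{M}{(t+1)^2} + \sum_{t=1}^{T}\sum_{i=1}^{n}\frac{L_i(t+1)}{t+1} \ge \sum_{t=1}^{T}\frac{\alpha_{t+1}}{t+1}.$$

Note that $\frac{1}{(t+1)^2}$ is summable; therefore all terms on the left hand side are bounded above for all 
$T \ge 1$, and hence it follows that 

$$\sum_{t=2}^{T}\frac{\alpha_t}{t}$$

is bounded above by some $B \in \mathbb{R}$, for all $T \ge 2$. Let $\beta_{t+1} = \sum_{i=1}^{n}\left[v_i^m(\bar{q}(t)) - U(\bar{q}^n(t))\right]$, and note that, 
by definition of $v_i^m(\cdot)$, $\beta_t \ge 0$ for all t. By a lemma in the appendix, $|v_i^m(\hat{q}_i(t)) - v_i^m(\bar{q}(t))| = O\left(\frac{\log t}{t^r}\right)$. 
Thus, 

$$|\alpha_t - \beta_t| = O\left(\frac{\log t}{t^r}\right),$$

and hence, by an auxiliary lemma, $\sum_{t=2}^{T}\frac{\beta_t}{t}$ converges as $T \to \infty$. By Lemma 3 it then follows that 

$$\lim_{T \to \infty}\frac{\beta_2 + \beta_3 + \cdots + \beta_T}{T} = 0.$$

Subsequently, by Lemma 6 we obtain for every $\epsilon > 0$, 

$$\lim_{T \to \infty}\frac{\#\{1 \le t \le T : \bar{q}^n(t) \notin C_\epsilon\}}{T} = 0.$$

By an auxiliary lemma this is equivalent to 

$$\lim_{T \to \infty}\frac{\#\{1 \le t \le T : \bar{q}^n(t) \notin B_\delta(C)\}}{T} = 0$$

for every $\delta > 0$. Finally, by Lemma 9 we obtain $d(\bar{q}^n(t), C) \to 0$ as $t \to \infty$. ■ 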

We emphasize that Theorem 1 shows that the n-tuple of the average empirical distribution converges 
to C, that is, $d(\bar{q}^n(t), C) \to 0$. This is not the same as the more traditional definition of convergence in 
empirical frequency, 

$$d(q(t), C) \to 0 \text{ as } t \to \infty. \tag{10}$$

The practical meaning of Theorem 1 is that players do in fact learn a consensus equilibrium strategy. It is 
true that each player i has access only to the distribution $\hat{q}_i(t)$. However, the tuple of these distributions 
$(\hat{q}_1(t), \hat{q}_2(t), \ldots, \hat{q}_n(t))$ also converges asymptotically to the set of consensus equilibria, i.e., 

$$d((\hat{q}_1(t), \hat{q}_2(t), \ldots, \hat{q}_n(t)), C) \to 0,$$

by A.3. Therefore, player i has direct access to her portion of the convergent joint strategy. Taken as a 
whole, player i learns a strategy which is a Nash (consensus) equilibrium with respect to the strategies 
learned by other players. 

IV. Distributed ECFP 
A. Distributed Problem Formulation 

The result given in Theorem 1 is powerful in that it guarantees convergence to a Nash (consensus) 
equilibrium given only an estimate of the average empirical distribution. We wish to implement the 
algorithm in a distributed setting where information exchange is restricted to a local neighborhood of 
each agent. 

Consider the following problem formulation: Players are engaged in an n-player repeated game. 
Players are endowed with an ancillary communication network which we represent using the graph 
G = (V,E), where a vertex represents a player, and an edge represents the ability of two players to 
exchange information. Players are permitted to exchange information with neighbors once per iteration 
of the repeated game. Note this implies there is only one time scale for both communication and game 
play. The information available to a player is restricted to knowledge of her own actions, and whatever 
information her neighbors choose to share with her. Information exchange is non-strategic; players do 
not try to manipulate the information they send for strategic gain. 

We maintain assumptions A.1 and A.2 given in section II and we add an additional assumption 
pertaining to the communication network. 

A.4. The graph G = (V, E) modeling the ancillary inter-agent communication network is connected.^4 

The following two matrices are defined to facilitate a more compact description of the algorithm. Let 

$$Q(t) := (q_1(t)\ q_2(t)\ \ldots\ q_n(t))^T \in \mathbb{R}^{n \times m}$$

be the matrix containing players' empirical distributions. Let $\hat{q}_i(t) \in \mathbb{R}^m$ be player i's estimate of 
$\bar{q}(t) \in \mathbb{R}^m$.^5 Let 

$$\hat{Q}(t) := (\hat{q}_1(t)\ \hat{q}_2(t)\ \ldots\ \hat{q}_n(t))^T \in \mathbb{R}^{n \times m}$$

be a matrix containing players' estimates of the average empirical distribution. Let $\hat{q}(t) \in \mathbb{R}^{nm}$ be the 
n-tuple $(\hat{q}_1(t), \ldots, \hat{q}_n(t))$. The tuple $\hat{q}(t)$ will be important in distributed ECFP; in particular we will 
prove that $\hat{q}(t)$ converges to the set of consensus equilibria. 

B. Distributed ECFP Algorithm 

Initialize 
(i) At time t = 1, each player i chooses an arbitrary initial action $a_i(1)$. The initial empirical distribution 
for player i is given by $q_i(1) = a_i(1)$. Player i initializes her local estimate of the empirical distribution 
as 

$$\hat{q}_i(1) = \sum_{j \in \Omega_i \cup \{i\}} w_{ij}\, q_j(1), \tag{11}$$

where $\Omega_i$ is the set of neighbors of player i and $w_{ij}$ is a weighting constant. 

4 A graph is said to be connected if there exists a path (possibly multi-hop) between any pair of vertices. 
5 As noted in section III, the estimate $\hat{q}_i(t)$ may be outside the set $\Delta$. 

Iterate 
(ii) At each time $t \ge 1$, player i computes the set of best responses using $\hat{q}_i(t)$ as the assumed mixed 
strategy for each of the n - 1 other players. The next action 

$$a_i(t+1) \in \arg\max_{\alpha_i \in A_i} U(\alpha_i, \hat{q}_{-i}(t)) \tag{12}$$

is played according to the best response calculation. In the event of multiple pure strategy best responses, 
any of the maximizing actions in (12) may be chosen arbitrarily. The local empirical distribution $q_i(t+1)$ 
is updated to reflect the action taken, i.e., 

$$q_i(t+1) = q_i(t) + \frac{1}{t+1}\left(a_i(t+1) - q_i(t)\right).$$

(iii) Subsequently each player i computes a new estimate of the network-average empirical distribution 
using the following update rule: 

$$\hat{q}_i(t+1) = \sum_{j \in \Omega_i \cup \{i\}} w_{ij}\left(\hat{q}_j(t) + q_j(t+1) - q_j(t)\right), \tag{13}$$

where $\Omega_i$ is the set of neighbors of player i, and $w_{ij}$ is a weighting constant.^6 
The update in (13) is represented in more compact notation as 

$$\hat{Q}(t+1) = W\left[\hat{Q}(t) + Q(t+1) - Q(t)\right], \tag{14}$$

where $W \in \mathbb{R}^{n \times n}$ is a weighting matrix with entries $w_{ij}$. We assume W satisfies the following 
assumption: 

A.5. The weight matrix W is an n x n matrix that is doubly stochastic, aperiodic, and irreducible, with 
sparsity conforming to the communication graph G. 

Note that given assumption A.4 (G is a connected graph), it is always possible to find a matrix W 
satisfying these conditions (see [36]). 
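
As an illustration of assumption A.5 and the update (14), the sketch below builds a weight matrix with the Metropolis rule, w_ij = 1 / (1 + max(d_i, d_j)) on edges with the remainder placed on the diagonal. This rule is one common construction chosen for the example and is not necessarily the one used in the paper. On a connected graph it gives a symmetric, doubly stochastic matrix with strictly positive self-weights. The helper names and the three-player path graph are hypothetical. Because W is doubly stochastic, the average of the players' estimates equals the true centroid at every step, which the final lines check.

```python
def metropolis_weights(n, edges):
    """Symmetric, doubly stochastic weights conforming to an undirected graph."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        W[i][j] = W[j][i] = 1.0 / (1 + max(deg[i], deg[j]))
    for i in range(n):
        W[i][i] = 1.0 - sum(W[i])  # self-weight absorbs the remainder
    return W


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def update_estimates(W, Q_hat, Q_new, Q_old):
    """One step of (14): Q_hat(t+1) = W [Q_hat(t) + Q(t+1) - Q(t)]."""
    inner = [[h + a - b for h, a, b in zip(r_h, r_a, r_b)]
             for r_h, r_a, r_b in zip(Q_hat, Q_new, Q_old)]
    return matmul(W, inner)


def column_means(Q):
    return [sum(col) / len(Q) for col in zip(*Q)]


# Hypothetical path graph 0 - 1 - 2 with m = 2 actions.
W = metropolis_weights(3, [(0, 1), (1, 2)])
Q1 = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # q_i(1) = a_i(1)
Q_hat1 = matmul(W, Q1)                       # initialization (11)
Q2 = [[0.5, 0.5], [0.0, 1.0], [0.5, 0.5]]    # after actions 1, 1, 0 at t = 2
Q_hat2 = update_estimates(W, Q_hat1, Q2, Q1)
```

Each row of `Q_hat2` is one player's estimate of the centroid; the rows differ from one another, but their average matches the column means of `Q2`.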



C. Distributed ECFP: Main Result 

We refer to any sequence of actions {a(t)}^ 1 which can be attained using the distributed ECFP 
algorithm of section IIV-BI as a distributed ECFP process. In a distributed ECFP process, players learn a 
consensus equilibrium strategy in a setting where information exchange is restricted to a local neighbor- 
hood of each agent. The result is summarized in the following Theorem. 

6 Note that the set fit U {i} in the summation indicates that player i uses its own (local) information and that of her neighbors 
to update her estimate. The update rule is clearly distributed as information exchange is restricted to neighboring players only. 
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Theorem 2. Let {ai(t)}^ 1 be a distributed ECFP process such that assumptions Aj7] Aj2] Aj4] and 
Aj5] hold. Then d(q(t),C) — > as t — )■ oo. /n particular, the agent estimates qi(t)'s reach asymptotic 
consensus, i.e. d(<ji(t),qj(t)) — )■ aw £ — > oo for each pair (i,j) of agents. Moreover, the agents achieve 
asymptotic strategy learning, m ?/ze sense that d((qi(t)) n , C) — > as t — > oo /or all i = 1, . . . ,n. 

This result implies that the n-tuple (qi(t), . . . ,q n {t)) converges to the set C; since qi(t) is available 
to player i, player i learns the component of the consensus equilibrium strategy relevant to her. 

Proof: We would like to apply the results of Theorem 1 to the distributed ECFP process. Assumptions
A.1 and A.2 hold in a distributed ECFP process by assumption. By Lemma 2 the error in a distributed
ECFP process decays as $\|\hat{q}_i(t) - \bar{q}(t)\| = O\left(\frac{\log t}{t}\right)$; thus A.3 is satisfied (with $r = 1$), meeting all
necessary assumptions for Theorem 1. Applying Theorem 1, $d(\bar{q}^n(t), C) \to 0$ as $t \to \infty$. By Lemma 2
we obtain $\|\hat{q}_i(t) - \bar{q}(t)\| \to 0$ as $t \to \infty$, and the result $d((\hat{q}_i(t))^n, C) \to 0$ as $t \to \infty$ follows. ■

Again, we emphasize that this mode of convergence is not the same as the more traditional convergence 
in empirical frequency, given in (10).

V. Generalizations 

A. ECFP in Potential Games 

The assumption A.2 of identical permutation invariant utility functions can be relaxed to the
following broader assumption:

A.6. The game $\Gamma$ is an exact potential game with a permutation invariant potential function.

A game $\Gamma$ is an exact potential game if there exists some function $P(y) : Y^n \to \mathbb{R}$ such that

$$u_i(y_i', y_{-i}) - u_i(y_i'', y_{-i}) = P(y_i', y_{-i}) - P(y_i'', y_{-i}) \quad \forall i \in N,\ \forall y_i', y_i'' \in Y_i.$$

The function $P(y)$ is called a potential function for $\Gamma$. The generalized form of Theorem 1 is as follows:

Theorem 3. Let $\{a(t)\}_{t=1}^{\infty}$ be an ECFP process such that A.1 (identical action spaces), A.3 ($\|\hat{q}_i(t) - \bar{q}(t)\| =
O\left(\frac{\log t}{t^r}\right)$ $\forall i$, for some $r > 0$), and A.6 hold. Then $d(\bar{q}^n(t), C) \to 0$ as $t \to \infty$.

Proof: Let $\Gamma_1 = (N, Y, \{U_i\}_{i \in N})$ be an exact potential game with potential function $P$. Let
$\Gamma_2 = (N, Y, \{\tilde{U}_i\}_{i \in N})$ be a game with the same set of players and actions as $\Gamma_1$, but with all players using
$P$ as their utility function ($\tilde{U}_i = P$, $\forall i$). Let $C_{\Gamma_1}$ and $C_{\Gamma_2}$ be the sets of consensus equilibria in $\Gamma_1$
and $\Gamma_2$ respectively. Let $\bar{q}_{\Gamma_1}(t)$, $\bar{q}_{\Gamma_2}(t)$ be the average empirical distributions corresponding to ECFP
processes in $\Gamma_1$ and $\Gamma_2$ respectively. Note that the sets of consensus equilibria in $\Gamma_1$ and $\Gamma_2$ coincide [19].
Also note that $\Gamma_1$ and $\Gamma_2$ are best response equivalent [37], therefore a valid ECFP process for $\Gamma_1$ is
a valid ECFP process for $\Gamma_2$, and vice versa. $\Gamma_2$ is a game with identical action spaces and identical
permutation-invariant utility functions and therefore falls within the purview of Theorem 1. By Theorem
1, $d(\bar{q}^n_{\Gamma_2}(t), C_{\Gamma_2}) \to 0$. By best response equivalence, any valid ECFP process in $\Gamma_1$ is a valid ECFP
process in $\Gamma_2$, therefore $d(\bar{q}^n_{\Gamma_1}(t), C_{\Gamma_2}) \to 0$. Since $C_{\Gamma_1}$ and $C_{\Gamma_2}$ coincide, $d(\bar{q}^n_{\Gamma_1}(t), C_{\Gamma_1}) \to 0$. ■

Potential games are studied in [19]. A game which admits an exact potential function is known as
an exact potential game. The class of exact potential games includes congestion games iTTTTl . Conges- 
tion games have many useful applications in economics and engineering. We present an example of a 
congestion game in the distributed traffic routing example presented in the applications section. 
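
To make the exact potential property concrete, the sketch below checks the defining identity $u_i(y_i', y_{-i}) - u_i(y_i'', y_{-i}) = P(y_i', y_{-i}) - P(y_i'', y_{-i})$ by brute force on a tiny congestion game. It assumes the standard Rosenthal potential for congestion games; the cost function, the game size and all function names are hypothetical choices made for illustration only.

```python
from itertools import product


def road_cost(load):
    """Hypothetical delay on a road carrying `load` cars (illustrative cubic)."""
    return load ** 3 + 2 * load


def loads(profile, num_roads):
    """Number of cars on each road under the joint action `profile`."""
    counts = [0] * num_roads
    for road in profile:
        counts[road] += 1
    return counts


def utility(i, profile, num_roads):
    """u_i(y) = -c_{y_i}(y): negative delay on the road chosen by player i."""
    return -road_cost(loads(profile, num_roads)[profile[i]])


def potential(profile, num_roads):
    """Rosenthal potential: P(y) = -sum_r sum_{k=1}^{load_r(y)} c_r(k)."""
    return -sum(road_cost(k)
                for load in loads(profile, num_roads)
                for k in range(1, load + 1))


def is_exact_potential(num_players, num_roads):
    """Check the exact potential identity for every unilateral deviation."""
    for profile in product(range(num_roads), repeat=num_players):
        for i in range(num_players):
            for alt in range(num_roads):
                deviated = profile[:i] + (alt,) + profile[i + 1:]
                du = utility(i, profile, num_roads) - utility(i, deviated, num_roads)
                dp = potential(profile, num_roads) - potential(deviated, num_roads)
                if du != dp:
                    return False
    return True


result = is_exact_potential(num_players=3, num_roads=2)
```

The potential here depends on the joint action only through the road loads, so relabeling the players leaves it unchanged; this is the permutation invariance that A.6 asks for.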

B. Distributed Implementation of Traditional FP 

In the traditional FP best response (H}, players are required to have precise knowledge of the empirical 
distribution of all other players at time t in order to pick the next stage action at time t + 1. This 
requirement is tantamount to requiring that players have global knowledge and at first glance seems to 
disqualify the algorithm for distributed implementation. However, in this section we show that traditional 
FP can in fact be implemented in a distributed setting by using the same methodology employed to adapt 
ECFP for a distributed setting. The algorithm, which we call distributed FP (DFP), allows players to 
exchange information with neighbors via an ancillary communication network in order to estimate the 
empirical distribution of play. Players then pick a best response using the estimated empirical distribution 
and the traditional FP best response rule (H). The essential feature of the algorithm, the best response 
rule, is identical to that of FP; thus, the algorithm inherits the large-scale implementation problems 
of computational complexity and high information overhead associated with FP as mentioned in the 
introduction. Nevertheless, FP is an archetypal learning algorithm and demonstrating that our methodology 
extends to FP not only provides a rigorous demonstration of how this classic learning algorithm might 
be implemented in a distributed setting, but also suggests that a similar methodology might be employed 
to implement other FP variants in a distributed setting. 

Consider the following problem setup: players are engaged in a repeated n-player game. Players are 
endowed with an ancillary communication network represented by the graph $G = (V, E)$, as described
in Section IV-A. Players are permitted to exchange information with neighbors once per iteration of the
repeated game. 

The key aspect of Theorem 1 which permits a distributed implementation of ECFP is embodied in
assumption A.3: if the error in players' approximations of the average empirical distribution decays
quickly enough, then the process will converge to a consensus equilibrium. In the case of traditional FP
we can show convergence given a similar assumption:

A.7. For all pairs $(i,j)$ of players, $\|\hat{q}^i_j(t) - q_j(t)\| = O\left(\frac{\log t}{t}\right)$, where $q_j(t)$ is the empirical distribution
of player $j$, and $\hat{q}^i_j(t)$ is the estimate which player $i$ maintains of $q_j(t)$.

Formally, we say the sequence $\{a(t)\}_{t=1}^{\infty}$ is an asymptotically empirical FP process if

$$U\left(\hat{q}^i_1(t), \ldots, \hat{q}^i_{i-1}(t), a_i(t+1), \hat{q}^i_{i+1}(t), \ldots, \hat{q}^i_n(t)\right) = v_i\left(\hat{q}^i_1(t), \ldots, \hat{q}^i_n(t)\right),$$

where $v_i(\cdot)$ is the FP best response defined earlier, and the initial action $a_i(1)$ is chosen arbitrarily for all
i. We also make the following assumption, familiar from Q: 

A.8. $\Gamma$ is a game with identical interests.

Under these assumptions, an FP process can be shown to converge to the set of Nash equilibria, as
stated in the following theorem.

Theorem 4. Let $\{a(t)\}_{t=1}^{\infty}$ be an asymptotically empirical FP process such that A.7 - A.8 hold. Then

$$d(q(t), K) \to 0 \text{ as } t \to \infty.$$

This can be seen as a generalization of the proof of the fictitious play property for games with identical 
interests found in Q. The technical details of this proof follow very closely with the proof of Theorem 
1 and are omitted here for brevity.

With this result in mind, we construct a distributed FP algorithm using the same methodology as the
distributed ECFP algorithm of Section IV-B. We show that the distributed FP algorithm converges in
empirical frequency to the set of Nash equilibria.

1) Distributed FP algorithm: We introduce some notation to facilitate a compact description of the
algorithm. Let $q_j(t) \in \mathbb{R}^{m_j}$ be the empirical distribution of player $j$. Let $\hat{q}^i_j(t) \in \mathbb{R}^{m_j}$ be the estimate
which player $i$ maintains for $q_j(t)$. Let $s = \sum_{k \in N} m_k$. Let

$$\hat{q}^i(t) = \left((\hat{q}^i_1(t))^T\ (\hat{q}^i_2(t))^T\ \cdots\ (\hat{q}^i_n(t))^T\right)^T \in \mathbb{R}^s$$

be the vector of player $i$'s estimates. Let $q'_i(t)$ be an augmented vector representing the empirical
distribution of player $i$ such that

$$q'_i(t) = \left(0\ \cdots\ (n \cdot q_i(t))^T\ \cdots\ 0\right)^T \in \mathbb{R}^s.$$

The augmented vector $q'_i(t)$ matches the general structure of $\hat{q}^i(t)$, but in the place of $\hat{q}^i_i(t)$ we substitute
in $n \cdot q_i(t)$ (a scaled copy of the true empirical distribution) and set all other entries to zero. Let

$$\hat{Q}(t) = \left(\hat{q}^1(t)\ \hat{q}^2(t)\ \cdots\ \hat{q}^n(t)\right)^T \in \mathbb{R}^{n \times s},$$

and let

$$Q'(t) = \left(q'_1(t)\ q'_2(t)\ \cdots\ q'_n(t)\right)^T \in \mathbb{R}^{n \times s}.$$
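
One way to read the factor $n$ in the augmented vectors is that averaging the rows of $Q'(t)$ returns the stacked vector $(q_1(t), \ldots, q_n(t))$, which is the quantity a doubly stochastic averaging scheme would track. The short Python sketch below checks this on made-up data; the distributions in `q` and the helper name `augmented_row` are assumptions chosen for illustration.

```python
def augmented_row(i, q, sizes):
    """Row q'_i: zeros everywhere except block i, which holds n * q_i."""
    n = len(sizes)
    row = []
    for j, m_j in enumerate(sizes):
        row.extend([n * v for v in q[i]] if j == i else [0.0] * m_j)
    return row


# Hypothetical empirical distributions for n = 3 players with 2, 3 and 2 actions.
q = [[0.5, 0.5], [0.2, 0.3, 0.5], [1.0, 0.0]]
sizes = [len(q_i) for q_i in q]
n = len(q)
s = sum(sizes)

# Q' has one row per player and s columns.
Q_prime = [augmented_row(i, q, sizes) for i in range(n)]

# Averaging the rows of Q' recovers the stacked vector (q_1, ..., q_n).
column_mean = [sum(Q_prime[i][c] for i in range(n)) / n for c in range(s)]
stacked = [v for q_i in q for v in q_i]
```

Each column of `Q_prime` has a single nonzero entry, scaled by `n`, so dividing the column sum by `n` gives back the original probability.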

Initialize
(i) At time $t = 1$, each player $i$ takes an arbitrary initial action $a_i(1)$. The initial empirical distribution for
player $i$ is given by $q_i(1) = a_i(1)$. Player $i$ initializes her local estimate of the full empirical distribution
as

$$\hat{q}^i(1) = \sum_{j \in \Omega_i \cup \{i\}} w_{ij}\, q'_j(1),$$

where $\Omega_i$ is the set of neighbors of player $i$ and $w_{ij}$ is a weighting constant.

(ii) At each time $t \geq 1$, player $i$ computes the set of best responses using $\hat{q}^i_j(t)$ as the assumed mixed
strategy for player $j$. The next action

$$a_i(t+1) \in \arg\max_{\alpha_i} U\left(\hat{q}^i_1(t), \ldots, \hat{q}^i_{i-1}(t), \alpha_i, \hat{q}^i_{i+1}(t), \ldots, \hat{q}^i_n(t)\right) \tag{15}$$

is played according to the best response calculation. In the event of multiple pure strategy best responses,
any of the maximizing actions in (15) may be chosen arbitrarily. The local empirical distribution $q_i(t+1)$
is updated to reflect the action taken, i.e.,

$$q_i(t+1) = q_i(t) + \frac{1}{t+1}\left(a_i(t+1) - q_i(t)\right).$$

(iii) Subsequently each player $i$ computes a new estimate of the empirical distribution using the
following update rule:

$$\hat{q}^i(t+1) = \sum_{j \in \Omega_i \cup \{i\}} w_{ij}\left(\hat{q}^j(t) + q'_j(t+1) - q'_j(t)\right), \tag{16}$$

where $\Omega_i$ is the set of neighbors of player $i$, and $w_{ij}$ is a weighting constant.$^{7}$

$^{7}$ Note that the set $\Omega_i \cup \{i\}$ in the summation indicates that player $i$ uses her own (local) information and that of her neighbors
to update her estimate. The update rule is clearly distributed, as information exchange is restricted to neighboring players only.

The update in (16) is represented in more compact notation as

$$\hat{Q}(t+1) = W\left(\hat{Q}(t) + Q'(t+1) - Q'(t)\right),$$

where $W \in \mathbb{R}^{n \times n}$ is a weighting matrix with entries $w_{ij}$ and satisfying assumption A.5.
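
The estimate dynamics of steps (i) and (iii) can be exercised in isolation. The following Python sketch is a rough illustration rather than the full algorithm: it assumes a ring communication graph with equal weights of 1/3, and it replaces the best response of step (ii) with uniformly random play, since only the tracking behavior of $\hat{Q}(t)$ is being examined. All function names and parameter values are hypothetical.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product for small dense matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def ring_weights(n):
    """Doubly stochastic W for a ring: equal weight on self and both neighbors."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] = 1.0 / 3.0
    return W


def unit(k, m):
    """Pure action k written as a vertex of the m-dimensional simplex."""
    return [1.0 if c == k else 0.0 for c in range(m)]


def augmented(q, m):
    """Matrix Q': row i is zero except for block i, which holds n * q_i."""
    n = len(q)
    return [[n * q[i][k] if j == i else 0.0
             for j in range(n) for k in range(m)] for i in range(n)]


def simulate(n=4, m=2, steps=2000, seed=0):
    """Return the largest entrywise estimate error after `steps` rounds."""
    rng = random.Random(seed)
    W = ring_weights(n)
    q = [unit(rng.randrange(m), m) for _ in range(n)]   # q_i(1) = a_i(1)
    Q_prime = augmented(q, m)
    Q_hat = matmul(W, Q_prime)                          # initialization, step (i)
    for t in range(1, steps):
        # Assumption: uniformly random play stands in for the best response (15).
        actions = [unit(rng.randrange(m), m) for _ in range(n)]
        q = [[q[i][k] + (actions[i][k] - q[i][k]) / (t + 1) for k in range(m)]
             for i in range(n)]
        Q_next = augmented(q, m)
        inner = [[Q_hat[i][c] + Q_next[i][c] - Q_prime[i][c]
                  for c in range(n * m)] for i in range(n)]
        Q_hat = matmul(W, inner)                        # matrix form of update (16)
        Q_prime = Q_next
    stacked = [v for q_i in q for v in q_i]
    return max(abs(Q_hat[i][c] - stacked[c])
               for i in range(n) for c in range(n * m))


max_error = simulate()
```

Every row of `Q_hat` is one player's estimate of the full stacked empirical distribution, so a small `max_error` means each player's local estimate is close to the true empirical distributions of all players.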

2) Convergence of Distributed FP: We refer to any sequence of actions $\{a(t)\}_{t=1}^{\infty}$ that can be attained
using the distributed FP algorithm of Section V-B1 as a distributed FP process. In distributed FP,
players learn a Nash equilibrium strategy in a setting where information exchange is restricted to a
local neighborhood of each player. The result is summarized in the following theorem.

Theorem 5. Let $\{a(t)\}_{t=1}^{\infty}$ be a distributed FP process such that A.4, A.5, A.7 and A.8 are satisfied.
Then play converges in terms of empirical frequency to the set of Nash equilibria; that is, $d(q(t), K) \to 0$
as $t \to \infty$.

Proof Sketch. We would like to apply the results of Theorem 4 to the distributed FP process. Assumption
A.8 holds in a distributed FP process by assumption. Using a slight variation of Lemma 2 it can be
shown that the error in a distributed FP process decays as $\|\hat{q}^i_j(t) - q_j(t)\| = O\left(\frac{\log t}{t}\right)$ $\forall i, j \in N$; thus,
A.7 is satisfied, meeting all necessary assumptions for Theorem 4. Applying Theorem 4, $d(q(t), K) \to 0$
as $t \to \infty$.

VI. Applications 

A. Distributed Traffic Routing 

We simulated distributed ECFP in a traffic routing scenario with 200 cars traversing a traffic network 
of five roads. This scenario is an instance of a congestion game. Any congestion game can be shown 
to satisfy assumption A.6 and therefore falls within the purview of ECFP. All vehicles start from the
same location and have the same destination. The number of cars on road $r$ for a joint strategy $y \in Y^n$
is given by $\sigma_r(y)$. The delay on road $r$ for a joint strategy $y$ is given by the cubic cost function

$$c_r(y) = a_3 \sigma_r(y)^3 + a_2 \sigma_r(y)^2 + a_1 \sigma_r(y) + a_0,$$

where the coefficients $a_k \in \mathbb{R}$ are arbitrary. The utility for player $i$ is given by $u_i(y) = -c_{y_i}(y)$, the
negative of the travel delay experienced by player $i$. In this particular simulation, the weights $w_{ij}$ (see
(14)) were chosen according to the Metropolis-Hastings rule [38]. A graph of the communication network
used for simulations is shown in Fig. 1. On the surface, the ECFP best response calculation appears
to have the same complexity as the FP best response calculation. However, the symmetry inherent
in the ECFP best response calculation can lead to simplifications. Given the cost functions used in this
simulation, the ECFP best response can be simplified to an expression that is independent of the number
of players. This fact is not unique to the cubic cost functions used here; it holds for games of this
form with polynomial cost functions of any degree. The key results of ECFP are based around showing
$d(\bar{q}^n(t), C) \to 0$. An important practical implication of this result is that $\bar{q}^n(t) \in K_{\epsilon_t}$, where $\epsilon_t \to 0$ and $K_{\epsilon_t}$
is the set of $\epsilon_t$-Nash equilibria. Fig. 2 shows a plot of the minimum $\epsilon_t$ such that $\bar{q}^n(t) \in K_{\epsilon_t}$.
This plot shows that $\epsilon_t$ tends to zero, which is consistent with $d(\bar{q}^n(t), C) \to 0$, the claim of our main
result in Theorem 2. For the traffic routing application, this means that players concurrently learn a
mixed strategy $\epsilon_t$-consensus equilibrium, where $\epsilon_t$ can be made arbitrarily small. Once $\epsilon_t$ is
sufficiently small for the game designers' purposes, the algorithm can be terminated. The cost functions
used to model road delay were specifically chosen as cubic polynomials in order to model a situation
in which there may exist multiple consensus equilibria; distributed ECFP is particularly relevant to such
situations since it can be used not only to compute a consensus equilibrium, but also to ensure that
players agree on which consensus equilibrium is reached.

Fig. 1. Randomly generated sparse communication graph.

VII. Conclusions 

We have introduced a variant of the well-known FP algorithm which we call empirical centroid fictitious 
play (ECFP). Rather than track and respond to the empirical distribution of each player, as in FP, ECFP 
tracks the centroid of the marginal empirical distributions and computes a best response with respect 
to this same quantity. The computational complexity of computing a best response is mitigated by the 
introduction of symmetry into the best response calculation, and the information tracking problem is 
mitigated by requiring players to track a quantity which is invariant to the number of players in the game. 
ECFP is shown to converge to the set of consensus equilibria, a subset of the Nash equilibria where all
players use an identical strategy, for potential games with permutation invariant potential functions.

Fig. 2. The minimum $\epsilon_t$ on each iteration $t$ such that $\bar{q}^n(t) \in K_{\epsilon_t}$. The trend $\epsilon_t \to 0$ is consistent with convergence to the set
of consensus equilibria (i.e. $d(\bar{q}^n(t), C) \to 0$ as $t \to \infty$) as stated in Theorem 2.

We have provided a distributed implementation of ECFP which depends only on local information 
exchange, and we have proven convergence of the algorithm to the set of consensus equilibria. Further- 
more, we have shown that the same approach can be used to formulate a distributed implementation of 
the traditional fictitious play algorithm. 

An interesting future research direction would be to investigate whether ECFP can be shown to converge to
a consensus equilibrium for the more general class of symmetric games. It would also be of interest to
investigate whether ECFP can be shown to converge to the more general equilibrium concept of a symmetric
equilibrium in games in which a consensus equilibrium may not exist.

VIII. Appendix
A. Distributed averaging in dynamic networks

This appendix concerns topics in distributed consensus in networks where node values are dynamic
quantities. The results of this section are used to prove convergence of the distributed algorithms presented
in Sections IV-B and V-B1. Results in this section are similar to results on distributed averaging in
networks with additive changes in node values and information dynamics in [39], [40], [41]. For a survey
of traditional consensus and gossip algorithms, the reader may refer to [42], [43], [36].

Consider a network of $n$ nodes connected through a communication graph $G = (V, E)$. The graph is
assumed to be connected. Let $x_i(t) \in \mathbb{R}$ be the value of node $i$ at time $t$, and let $x(t) \in \mathbb{R}^n$ be the vector
of values at all nodes. The goal is for each node to track the instantaneous average $\bar{x}(t) = \frac{1}{n}\sum_{i=1}^{n} x_i(t)$,
$\bar{x}(t) \in \mathbb{R}$, given that the value at each node $x_i(t)$ is time varying. Let $\delta_i(t) = x_i(t+1) - x_i(t)$ be the
change in the value at node $i$, and $\delta(t) = x(t+1) - x(t)$ be the vector of changes at all nodes, $\delta(t) \in \mathbb{R}^n$.
Suppose the magnitude of the change at time $t$ is bounded by $|\delta_i(t)| = |x_i(t+1) - x_i(t)| \leq \epsilon(t)$ $\forall i$. We
make the following assumption:

A.9. The sequence $\{\epsilon(t)\}_{t=0}^{\infty}$ is monotone non-increasing.

Let $\hat{x}_i(t) \in \mathbb{R}$ be the estimate of $\bar{x}(t)$ at node $i$ and let $\hat{x}(t) \in \mathbb{R}^n$ be the vector of estimates. We make
the following assumption pertaining to the initial error in players' estimates.

A.10. $\hat{x}_i(0) - \bar{x}(0) = 0$ $\forall i$.

Let the average be estimated using the update rule

$$\hat{x}(t+1) = W\left(\hat{x}(t) + x(t+1) - x(t)\right), \tag{17}$$

where the matrix $W \in \mathbb{R}^{n \times n}$ is aperiodic, irreducible and doubly stochastic with sparsity conforming to
$G$. The following lemma gives a bound for the error in the estimates of $\bar{x}(t)$.

Lemma 1. Let the sequence $\{\hat{x}(t)\}_{t=1}^{\infty}$ be computed according to (17) such that assumptions A.4, A.5
and A.10 hold and let the incremental change in $x(t)$ be bounded according to assumption A.9. Then
the error at any time $t$ is bounded by

$$\|\hat{x}(t) - \bar{x}(t)\mathbf{1}\| \leq \frac{2\sqrt{n}\,\lambda}{1-\lambda}\,\epsilon_{avg}(t),$$

where $\lambda = \sup_{x \neq 0,\ \mathbf{1}^T x = 0} \frac{\|Wx\|}{\|x\|}$, and $\epsilon_{avg}(t) = \frac{1}{t}\sum_{r=0}^{t-1} \epsilon(r)$ is the time average of $\{\epsilon(r)\}_{r=0}^{t-1}$.

Proof: Let $e(t) = \hat{x}(t) - \bar{x}(t)\mathbf{1}$ be the vector of errors in each player's estimate of $\bar{x}(t)$, where $\mathbf{1}$
denotes the $n \times 1$ vector of all ones. Let

$$\bar{\delta}(t) = \frac{1}{n}\sum_{i=1}^{n} \delta_i(t).$$

Using the relation (17) and the properties of doubly stochastic matrices, the vector of errors may be
written recursively as

$$e(t+1) = W\left(e(t) + \xi(t)\right), \tag{18}$$

where $\xi(t) = \delta(t) - \bar{\delta}(t)\mathbf{1}$. Note that

$$|\xi_i(t)| = |\delta_i(t) - \bar{\delta}(t)| \leq |\delta_i(t)| + |\bar{\delta}(t)| \leq 2\epsilon(t),$$

and

$$\|\xi(t)\|^2 = \sum_{i=1}^{n} (\xi_i(t))^2 \leq \sum_{i=1}^{n} 4\epsilon(t)^2 = 4n\epsilon(t)^2. \tag{19}$$

Using (18), the error $e(t)$ can be rewritten as a function of $\xi(t)$ and $e(0)$,

$$e(t+1) = \sum_{r=0}^{t} W^{r+1}\xi(t-r) + W^{t+1}e(0).$$

Using this relationship we establish an upper bound on the error,

$$\|e(t+1)\| = \left\|\sum_{r=0}^{t} W^{r+1}\xi(t-r) + W^{t+1}e(0)\right\| \leq \sum_{r=0}^{t} \|W^{r+1}\xi(t-r)\| \leq \sum_{r=0}^{t} \lambda^{r+1}\|\xi(t-r)\|, \tag{20}$$

where we have employed assumption A.10, $e(0) = 0$. Applying (19) in (20), we get

$$\|e(t+1)\| \leq \sum_{r=0}^{t} \lambda^{r+1}\, 2\sqrt{n}\,\epsilon(t-r).$$

Recall that $\epsilon_{avg}(t) = \frac{1}{t}\sum_{r=0}^{t-1}\epsilon(r)$ is the time average of the sequence $\{\epsilon(t)\}$ up to time $t$, and note that
given our assumptions on $W$, it holds that $\lambda < 1$ (see [36]). Hence, by invoking Lemma 10 we have that

$$\|e(t+1)\| \leq \sum_{r=0}^{t} \lambda^{r+1}\, 2\sqrt{n}\,\epsilon_{avg}(t+1) = \left(\lambda\,\frac{1-\lambda^{t+1}}{1-\lambda}\right) 2\sqrt{n}\,\epsilon_{avg}(t+1) \leq \frac{2\sqrt{n}\,\lambda}{1-\lambda}\,\epsilon_{avg}(t+1).$$

This gives us the desired upper bound for the error,

$$\|\hat{x}(t) - \bar{x}(t)\mathbf{1}\| = \|e(t)\| \leq \frac{2\sqrt{n}\,\lambda}{1-\lambda}\,\epsilon_{avg}(t).$$
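
The bound of Lemma 1 can be checked numerically on a small example. The Python sketch below assumes a ring of $n = 5$ nodes with equal weights of 1/3 on each node and its two neighbors; for this symmetric circulant $W$ the eigenvalues are $(1 + 2\cos(2\pi k/n))/3$, so $\lambda$ is the largest magnitude among $k = 1, \ldots, n-1$. The node increments are random signs scaled by $\epsilon(t) = 1/(t+1)$. The graph, the signal and the function name `track` are illustrative assumptions.

```python
import math
import random


def track(n=5, steps=500, seed=1):
    """Run update (17) on a ring and compare the error with the Lemma 1 bound."""
    rng = random.Random(seed)
    # Contraction factor of the ring weight matrix on the zero-mean subspace.
    lam = max(abs(1 + 2 * math.cos(2 * math.pi * k / n)) / 3 for k in range(1, n))
    x = [0.0] * n        # node values x(0)
    x_hat = [0.0] * n    # estimates, with zero initial error (A.10)
    eps_sum = 0.0
    worst_ratio = 0.0
    for t in range(steps):
        eps = 1.0 / (t + 1)                      # non-increasing bound (A.9)
        delta = [rng.choice((-1.0, 1.0)) * eps for _ in range(n)]
        y = [x_hat[i] + delta[i] for i in range(n)]
        # Multiplication by W: average self and both ring neighbors.
        x_hat = [(y[i - 1] + y[i] + y[(i + 1) % n]) / 3 for i in range(n)]
        x = [x[i] + delta[i] for i in range(n)]
        eps_sum += eps
        eps_avg = eps_sum / (t + 1)              # average of eps(0), ..., eps(t)
        mean = sum(x) / n
        err = math.sqrt(sum((v - mean) ** 2 for v in x_hat))
        bound = 2 * math.sqrt(n) * lam / (1 - lam) * eps_avg
        worst_ratio = max(worst_ratio, err / bound)
    return lam, worst_ratio


lam, worst_ratio = track()
```

`worst_ratio` is the largest observed value of the error divided by the bound over the whole run, so a value at or below one indicates that the inequality held at every step of this particular simulation.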
Lemma 2. Let $\{a(t)\}_{t \geq 1}$ be a distributed ECFP process as defined in Section IV-B (see equations (11) - (14)).
Then $\|\hat{q}_i(t) - \bar{q}(t)\| = O\left(\frac{\log t}{t}\right)$, where $\bar{q}(t)$ is the average empirical distribution and $\hat{q}_i(t)$ is player
$i$'s estimate of $\bar{q}(t)$.

Proof: Recall we use the second argument, $k$, to index the components of the vector $q_i(t) \in \mathbb{R}^m$.
Noting that

$$q_i(t+1) = q_i(t) + \frac{1}{t+1}\left(a_i(t+1) - q_i(t)\right),$$

it follows that the maximum incremental change for any single value in the vector $q_i(t)$ is bounded by

$$|q_i(t+1,k) - q_i(t,k)| = \left|\frac{1}{t+1}\left(a_i(t+1,k) - q_i(t,k)\right)\right| \leq \frac{1}{t+1}.$$

Thus the incremental change in any player's empirical distribution is bounded as $|q_i(t+1,k) - q_i(t,k)| \leq
\epsilon(t)$, where $\epsilon(t) = \frac{1}{t+1}$. Note that the distributed ECFP process (14) is updated column-wise (each column
corresponds to an action $k$) using an update rule equivalent to (17) of Lemma 1. Also note that, column-wise,
all necessary conditions of Lemma 1 are satisfied,$^{8}$ and specifically, we have $\epsilon(t) = \frac{1}{t+1}$. Thus we
apply Lemma 1 column-wise to $\hat{Q}(t)$ and $Q(t)$ of (14), where $x(t)$ of Lemma 1 corresponds to the $k$'th
column of $Q(t)$, and $\hat{x}(t)$ of Lemma 1 corresponds to the $k$'th column of $\hat{Q}(t)$, and obtain

$$\left\|\begin{pmatrix}\hat{q}_1(t,k) \\ \vdots \\ \hat{q}_n(t,k)\end{pmatrix} - \bar{q}(t,k)\mathbf{1}\right\| \leq \frac{2\sqrt{n}\,\lambda}{1-\lambda}\,\epsilon_{avg}(t) = O\left(\frac{\log t}{t}\right),$$

where $\epsilon_{avg}(t) = \frac{1}{t}\sum_{s=1}^{t}\frac{1}{s} = O\left(\frac{\log t}{t}\right)$. Thus, $|\hat{q}_i(t,k) - \bar{q}(t,k)| = O\left(\frac{\log t}{t}\right)$, $k = 1, \ldots, m$, $\forall i$, and
hence

$$\|\hat{q}_i(t) - \bar{q}(t)\| = O\left(\frac{\log t}{t}\right), \quad \forall i.$$



$^{8}$ The assumption of zero initial error (A.10) is satisfied since the initialization of $\hat{q}_i(1)$ in (11) is equivalent to letting $q_i(0) = 0$,
$\hat{q}_i(0) = 0$ for all $i$ in (13).

B. Intermediate Results

Lemma 3. Suppose the sum $-\infty < \sum_{t=1}^{\infty} \frac{a_t}{t} = S < \infty$ converges, then $\lim_{T \to \infty} \frac{a_1 + a_2 + \ldots + a_T}{T} = 0$.

Proof: By Kronecker's Lemma, $-\infty < \sum_{k=1}^{\infty} \frac{a_k}{k} = S < \infty \implies \lim_{T \to \infty} \frac{1}{T}\sum_{k=1}^{T} k\,\frac{a_k}{k} = 0$, which
implies that

$$\lim_{T \to \infty} \frac{a_1 + \ldots + a_T}{T} = 0.$$


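Lemma 4. For $i \in N$, let $\{q_{-i}(t)\}_{t=1}^{\infty} \in \Delta_{-i}$ and $\{r_{-i}(t)\}_{t=1}^{\infty} \in \Delta_{-i}$ be sequences such that $\|q_{-i}(t) -
r_{-i}(t)\| = O\left(\frac{\log t}{t^r}\right)$, $r > 0$. Let $U(p) : \Delta^n \to \mathbb{R}$ be the (multilinear) mixed utility function. Then

$$\left|\max_{p_i \in \Delta_i} U(p_i, q_{-i}(t)) - \max_{p_i \in \Delta_i} U(p_i, r_{-i}(t))\right| = O\left(\frac{\log t}{t^r}\right).$$

Proof: Let $\zeta'_{-i} \in \Delta_{-i}$ and $\zeta''_{-i} \in \Delta_{-i}$. Let $p^* = \arg\max_{p_i \in \Delta_i} U(p_i, \zeta'_{-i})$ and $p^{**} = \arg\max_{p_i \in \Delta_i} U(p_i, \zeta''_{-i})$.$^{9}$
$U(\cdot)$ is multilinear and is therefore Lipschitz continuous over the domain $\Delta^n$. Let $K$ be the Lipschitz
constant for $U(\cdot)$ such that $|U(x) - U(y)| \leq K\|x - y\|$ for $x, y \in \Delta^n$. By Lipschitz continuity, it holds
that

$$U(p^*, \zeta'_{-i}) \leq U(p^*, \zeta''_{-i}) + K\|\zeta'_{-i} - \zeta''_{-i}\| \leq U(p^{**}, \zeta''_{-i}) + K\|\zeta'_{-i} - \zeta''_{-i}\|, \tag{21}$$

and thus $U(p^*, \zeta'_{-i}) - U(p^{**}, \zeta''_{-i}) \leq K\|\zeta'_{-i} - \zeta''_{-i}\|$. By a symmetric argument to (21), we also establish
$U(p^{**}, \zeta''_{-i}) - U(p^*, \zeta'_{-i}) \leq K\|\zeta'_{-i} - \zeta''_{-i}\|$, thus

$$|U(p^*, \zeta'_{-i}) - U(p^{**}, \zeta''_{-i})| \leq K\|\zeta'_{-i} - \zeta''_{-i}\|.$$

From the above it follows that

$$\left|\max_{p_i \in \Delta_i} U(p_i, q_{-i}(t)) - \max_{p_i \in \Delta_i} U(p_i, r_{-i}(t))\right| \leq K\|q_{-i}(t) - r_{-i}(t)\|,$$

implying the result

$$\left|\max_{p_i \in \Delta_i} U(p_i, q_{-i}(t)) - \max_{p_i \in \Delta_i} U(p_i, r_{-i}(t))\right| = O\left(\frac{\log t}{t^r}\right).$$

$^{9}$ Note that such $p^*$ and $p^{**}$ exist, as $U(\cdot)$ is continuous and the maximization set is compact.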



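Lemma 5. Suppose $|a_t - b_t| = O\left(\frac{\log t}{t^r}\right)$, $r > 0$, $b_t \geq 0$ and $\sum_{t=1}^{T} \frac{a_t}{t} < B$ is bounded above by $B \in \mathbb{R}$
for all $T > 0$. Then $\sum_{t=1}^{T} \frac{b_t}{t}$ converges as $T \to \infty$.

Proof: Let

$$\delta_t = \begin{cases} b_t - a_t, & \text{if } b_t > a_t \\ 0, & \text{otherwise.} \end{cases}$$

It follows that $\delta_t \geq 0$ and $b_t \leq a_t + \delta_t$. By hypothesis, $|a_t - b_t| = O\left(\frac{\log t}{t^r}\right)$, which implies that $\delta_t = O\left(\frac{\log t}{t^r}\right)$.
It follows that

$$\sum_{t=1}^{T} \frac{b_t}{t} \leq \sum_{t=1}^{T} \frac{a_t + \delta_t}{t} = \sum_{t=1}^{T} \frac{a_t}{t} + \sum_{t=1}^{T} \frac{\delta_t}{t}.$$

Since $\sum_{t=1}^{T} \frac{a_t}{t} < B$ is bounded above, $\sum_{t=1}^{\infty} \frac{\delta_t}{t} < \infty$ converges, and $b_t \geq 0$, it follows that $\sum_{t=1}^{T} \frac{b_t}{t} < \infty$
converges as $T \to \infty$. ■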

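Lemma 6. Let $a_t = \sum_{i=1}^{n}\left[v_i^m(\bar{q}(t)) - U(\bar{q}^n(t))\right]$. Then $\lim_{T \to \infty} \frac{a_1 + \ldots + a_T}{T} = 0$ implies that, for every $\epsilon > 0$,

$$\lim_{T \to \infty} \frac{\#\{1 \leq t \leq T : \bar{q}^n(t) \notin C_\epsilon\}}{T} = 0.$$

Proof: Let $\epsilon > 0$ be given. By definition,

$$\bar{q}^n(t) \in C_\epsilon \iff v_i^m(\bar{q}(t)) - U(\bar{q}^n(t)) < \epsilon \quad \forall i. \tag{22}$$

The utility function $U(\cdot)$ is assumed to be permutation invariant for all players, so an equivalent statement
to (22) is

$$\bar{q}^n(t) \in C_\epsilon \iff \sum_{i=1}^{n}\left[v_i^m(\bar{q}(t)) - U(\bar{q}^n(t))\right] < n\epsilon.$$

Let

$$b_t = \begin{cases} 1, & \text{if } a_t \geq n\epsilon \\ 0, & \text{otherwise.} \end{cases}$$

Note that $b_t = 0 \iff \bar{q}^n(t) \in C_\epsilon$ and $b_t = 1 \iff \bar{q}^n(t) \notin C_\epsilon$, thus

$$\frac{\#\{1 \leq t \leq T : \bar{q}^n(t) \notin C_\epsilon\}}{T} = \frac{b_1 + \ldots + b_T}{T}.$$

Note also that $a_t \geq 0$. Clearly,

$$\frac{b_1 + \ldots + b_T}{T} \leq \frac{1}{n\epsilon}\,\frac{a_1 + \ldots + a_T}{T},$$

implying $\lim_{T \to \infty} \frac{b_1 + \ldots + b_T}{T} = 0$, from which the desired result,

$$\lim_{T \to \infty} \frac{\#\{1 \leq t \leq T : \bar{q}^n(t) \notin C_\epsilon\}}{T} = 0,$$

follows. ■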

Lemma 7. Let $p^n = (p_1, p_2, \ldots, p_n)$ be a consensus distribution such that $p_1 = p_2 = \ldots = p_n$ and let
$\delta > 0$ be given. Then there exists an $\epsilon > 0$ such that $p^n \notin B_\delta(C)$ implies $p^n \notin C_\epsilon$.

Proof: Let $B_\delta^c(C)$ denote the complement of the set $B_\delta(C)$. Suppose for the sake of contradiction
that there does not exist an $\epsilon > 0$ such that $p^n \in B_\delta^c(C) \implies p^n \notin C_\epsilon$. Then there exists some $p^n \in B_\delta^c(C)$
such that $p^n \in C_\epsilon$ $\forall \epsilon > 0$. This implies that $p^n \in C$, or equivalently, $d(p^n, C) = 0$. But the hypothesis
$p^n \in B_\delta^c(C)$ implies that $d(p^n, C) \geq \delta$, a contradiction. ■

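Lemma 8. $\lim_{T \to \infty} \frac{\#\{1 \leq t \leq T : \bar{q}^n(t) \notin C_\epsilon\}}{T} = 0$ for all $\epsilon > 0$ implies that $\lim_{T \to \infty} \frac{\#\{1 \leq t \leq T : \bar{q}^n(t) \notin B_\delta(C)\}}{T} = 0$ for all
$\delta > 0$.

Proof: Suppose

$$\lim_{T \to \infty} \frac{\#\{1 \leq t \leq T : \bar{q}^n(t) \notin C_\epsilon\}}{T} = 0$$

for all $\epsilon > 0$, but there exists some $\delta > 0$ such that $\limsup_{T \to \infty} \frac{\#\{1 \leq t \leq T : \bar{q}^n(t) \notin B_\delta(C)\}}{T} = \alpha > 0$. By Lemma
7 there exists an $\epsilon' > 0$ such that $\bar{q}^n(t) \notin B_\delta(C) \implies \bar{q}^n(t) \notin C_{\epsilon'}$, which implies that

$$\#\{1 \leq t \leq T : \bar{q}^n(t) \notin C_{\epsilon'}\} \geq \#\{1 \leq t \leq T : \bar{q}^n(t) \notin B_\delta(C)\}.$$

This implies

$$\limsup_{T \to \infty} \frac{\#\{1 \leq t \leq T : \bar{q}^n(t) \notin C_{\epsilon'}\}}{T} > 0$$

for some $\epsilon' > 0$, a contradiction. ■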

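Lemma 9. $\lim_{T \to \infty} \frac{\#\{1 \leq t \leq T : \bar{q}^n(t) \notin B_\delta(C)\}}{T} = 0$ for all $\delta > 0$ implies $\lim_{t \to \infty} d(\bar{q}^n(t), C) = 0$.

Proof: Our proof follows an existing methodology, but is adapted to show convergence to the set of
consensus equilibria. Let $\{a(t)\}_{t=1}^{\infty}$ be an empirical centroid fictitious play process, and let $\{\bar{q}^n(t)\}_{t=1}^{\infty}$ be
the associated average belief process. Let $\delta > 0$ be given, let $M = \max_{p', p''} \|p' - p''\|$, and let $\eta < \frac{\delta}{2\delta + M}$.
The hypothesis $\lim_{T \to \infty} \frac{\#\{1 \leq t \leq T : \bar{q}^n(t) \notin B_\delta(C)\}}{T} = 0$ implies that there exists an integer $T_0$ such that for every
$T \geq T_0$,

$$\#\{1 \leq t \leq T : \bar{q}^n(t) \notin B_\delta(C)\} < \eta T. \tag{23}$$

We claim that for every $T \geq T_0$, $\bar{q}^n(T) \in B_{2\delta}(C)$.

Suppose $T \geq T_0$ and $\bar{q}^n(T) \notin B_{2\delta}(C)$. Then for $T \leq t \leq T + \frac{\delta}{\delta + M}T$, $\bar{q}^n(t) \notin B_\delta(C)$. In order to verify
this note that

$$\|\bar{q}^n(T+1) - \bar{q}^n(T)\| \leq \frac{M}{T+1},$$

and for $T \leq t \leq T + \frac{\delta}{\delta + M}T$,

$$\|\bar{q}^n(t) - \bar{q}^n(T)\| \leq M\sum_{s=T}^{t-1}\frac{1}{s+1} \leq M\,\frac{\delta}{\delta + M}\,T\,\frac{1}{T} = \frac{\delta M}{\delta + M} < \delta.$$

Since $\bar{q}^n(t) \notin B_\delta(C)$ for $T \leq t \leq T + \frac{\delta}{\delta + M}T$, we have

$$\#\left\{1 \leq t \leq T + \frac{\delta}{\delta + M}T : \bar{q}^n(t) \notin B_\delta(C)\right\} \geq \frac{\delta}{\delta + M}T = \frac{\delta}{2\delta + M}\left(T + \frac{\delta}{\delta + M}T\right) > \eta\left(T + \frac{\delta}{\delta + M}T\right),$$

contradicting (23). Therefore, for any $\delta > 0$, there exists a $T_0$ such that, for all $T \geq T_0$, $d(\bar{q}^n(T), C) < 2\delta$,
i.e. $\lim_{t \to \infty} d(\bar{q}^n(t), C) = 0$. ■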

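Lemma 10. Let $\{a_t\}_{t=1}^{T}$, $a_t \geq 0$ be a monotone non-increasing sequence and let $\{b_t\}_{t=1}^{T}$, $b_t \geq 0$ be a
monotone non-decreasing sequence. Let $b_{avg} = \frac{1}{T}\sum_{t=1}^{T} b_t$ be the mean of $\{b_t\}_{t=1}^{T}$. Then $\sum_{t=1}^{T} a_t b_t \leq b_{avg}\sum_{t=1}^{T} a_t$.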
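
The inequality $\sum_t a_t b_t \leq b_{avg} \sum_t a_t$ for oppositely ordered sequences (a form of Chebyshev's sum inequality) is easy to spot-check numerically. The Python sketch below does so on randomly generated sequences; the helper name `chebyshev_gap` and the random test data are illustrative assumptions, and a numerical check of this kind is of course not a substitute for the proof.

```python
import random


def chebyshev_gap(a, b):
    """Return b_avg * sum(a) - sum(a_t * b_t); Lemma 10 says this is >= 0
    when a is non-increasing and b is non-decreasing."""
    b_avg = sum(b) / len(b)
    return b_avg * sum(a) - sum(x * y for x, y in zip(a, b))


# Small worked case: a = (3, 2, 1), b = (1, 2, 3).
example_gap = chebyshev_gap([3, 2, 1], [1, 2, 3])

# Randomized spot check over many oppositely ordered nonnegative sequences.
rng = random.Random(0)
gaps = []
for _ in range(1000):
    T = rng.randint(1, 20)
    a = sorted((rng.random() for _ in range(T)), reverse=True)  # non-increasing
    b = sorted(rng.random() for _ in range(T))                  # non-decreasing
    gaps.append(chebyshev_gap(a, b))
min_gap = min(gaps)
```

In the worked case the weighted sum is $3 + 4 + 3 = 10$ while $b_{avg}\sum_t a_t = 2 \cdot 6 = 12$, so the gap is 2.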






Proof: We represent the sequences using vectors $a, b \in \mathbb{R}^T$ and prove the result in this framework.
Let $b(1) \leq b(2) \leq \ldots \leq b(T)$ and $a(1) \geq a(2) \geq \ldots \geq a(T)$. We will prove the result for $a$ strictly
decreasing first and then show that it generalizes to the non-increasing case. Assume without loss of
generality that $\mathbf{1}^T a = \mathbf{1}^T b = 1$. Consider the optimization problem

$$\max_{\beta \geq 0,\ \mathbf{1}^T\beta = 1,\ \beta_1 \leq \ldots \leq \beta_T} a^T\beta. \tag{24}$$

The solution to (24) is $\beta^* = \frac{1}{T}\mathbf{1}$. To see this, assume for the sake of contradiction that $\beta^* = \frac{1}{T}\mathbf{1}$ is not
a solution. Let $\beta$ be a solution such that $\beta(i) < \beta(i+1)$ for some $i$. Create a new vector $\tilde{\beta}$ which is a duplicate of
$\beta$, with the exception that

$$\tilde{\beta}(i+1) = \tilde{\beta}(i) = \frac{\beta(i+1) + \beta(i)}{2}.$$

Then, writing $\delta = \frac{\beta(i+1) - \beta(i)}{2} > 0$,

$$a^T\tilde{\beta} - a^T\beta = a(i)\left(\tilde{\beta}(i) - \beta(i)\right) + a(i+1)\left(\tilde{\beta}(i+1) - \beta(i+1)\right) = a(i)(\delta) + a(i+1)(-\delta) = \delta\left(a(i) - a(i+1)\right) > 0,$$

a contradiction. We now consider relaxing the assumption on $a$; we let $a$ be non-increasing. Let $\{a_t\}_{t=1}^{\infty}$, $a_t \in
\mathbb{R}^T$ be a sequence of approximations of $a$ such that $a_t$ is strictly decreasing ($a_t(1) > a_t(2) > \ldots > a_t(T)$)
and $\lim_{t \to \infty} \|a - a_t\| = 0$. Then we have $a_t^T b \leq a_t^T \frac{1}{T}\mathbf{1}$. Taking the limit as $t \to \infty$,



